SUMMARY

Influenza-like illnesses (ILIs) are caused by several respiratory pathogens. These pathogens 
show weak to strong seasonal activity implying seasonality in ILI consultations. In this paper, 
the contribution of pathogens to seasonality of ILI consultations was statistically modelled. 
Virological count data were first smoothed using modulation models for seasonal time series. 
Second, Poisson regression was used regressing ILI consultation counts on the smoothed time 
series. Using ratios of the estimated regression parameters, relative measures of the under- 
reporting of pathogens were obtained. Influenza viruses A and B, parainfluenza virus and 
respiratory syncytial virus (RSV) significantly contributed to explain the seasonal variation in 
ILI consultations. We also found that RSV was the least and influenza virus A the most
underreported pathogen in Belgian laboratory surveillance. The proposed methods and results
are helpful in interpreting the data of clinical and laboratory surveillance, which are the essential 
parts of influenza surveillance. 
INTRODUCTION

Influenza is a common infectious disease, which has 
an important impact on society each year [1]. The 
typical clinical features of influenza disease include 
fever, respiratory symptoms, headache, muscle ache 
and fatigue [2]. In most cases, the influenza disease 
is self-limiting but it can evolve to life-threatening 
medical complications [3]. Recently, influenza has 
been identified as one of the three infectious diseases
causing the highest burden in Europe, along with HIV 
infection and tuberculosis [4]. Moreover, genetic re- 
assortments and mutations of influenza viruses might 
lead to the emergence of pandemics during which the 
rates of morbidity and mortality increase further. 

Influenza surveillance is implemented by many 
national and international authorities throughout 
the world [5, 6]. The World Health Organization 
(WHO) stresses the importance of influenza surveil- 
lance activities for the annual determination of influ- 
enza vaccine content and as an indispensable tool 
for pandemic preparedness [7]. A standard tool for 
monitoring influenza activity is the combination
of virological and clinical surveillance by a network of 
sentinel practitioners [5, 6]. As a tool for detection of 
the first circulating viruses, virological surveillance 
allows the characterization of strains by monitoring 
the rates of influenza virus positivity. Clinical sur- 
veillance is based on consultations for influenza- 
like illness (ILI), which is a clinical diagnosis of a set 
of common non-specific symptoms. These symptoms
include typical clinical features of influenza, although 
heterogeneous case definitions are used [8]. The com- 
bination of virological and clinical surveillance is 
generally considered to be the most accurate tool for 
monitoring influenza activity [9]. 

Respiratory pathogens other than influenza 
are generally not monitored by combined influenza 
surveillance [5, 6]. However, such pathogens might 
also cause ILI, so that ILI diagnoses have only poor to
moderate positive predictive values for laboratory-
confirmed influenza infections [10-12]. In particular,
along with influenza viruses A and B, parainfluenza 
virus, respiratory syncytial virus (RSV), adenovirus 
and Mycoplasma pneumoniae are regarded as other 
important respiratory pathogens with the potential to 
cause ILI. For most of these respiratory pathogens 
seasonality has been consistently observed, although 
the driving mechanisms are still poorly understood 
[13]. A typical example of a seasonal infectious disease 
is influenza. Annual influenza epidemics commonly 
occur during the winter season in temperate regions of 
the world with varying onset, duration and severity 
[14]. Moreover, the incidence of RSV varies con- 
spicuously by season, showing distinct seasonal pat- 
terns in different countries [15, 16]. Such seasonality 
in pathogen activity naturally implies seasonality in 
ILI consultations. 

In this study, the pathogens' contribution to sea- 
sonal variation in ILI was statistically modelled, using 
data from two independent surveillance systems in 
Belgium: clinical sentinel surveillance [17] and
laboratory sentinel surveillance, the latter being used to
monitor trends of different respiratory pathogens
[18]. The pathogens' contribution to the seasonality
of ILI was estimated using smooth modulation 
models for seasonal time series [19] and Poisson 
models regressing the number of ILI consultations on
the number of laboratory reports for various respir- 
atory pathogens. Epidemiological interpretations in 
terms of relative measures of underreported patho- 
gens were obtained by using ratios of estimated 
Poisson regression parameters. 



METHODOLOGY 
Data 

Clinical surveillance 

The clinical data on ILI consultations from January 
2004 to December 2008 were extracted from the 
General Practitioners (GPs) influenza surveillance 
database, which is obtained through a weekly regis- 
tration network of GPs coordinated by the Belgian 
Scientific Institute of Public Health (WIV-ISP) 
[17]. This database contains, among others, weekly 
information on the number of ILI consultations 
with the case definition for ILI being sudden onset 
of illness, associated with fever, respiratory and gen- 
eral symptoms. Since October 2007, data have 
been collected by the Belgian sentinel GPs network, 
in which about 180 GPs participate. The participating 
GPs cover 1.75% of the total Belgian patient popu- 
lation and are representative of the profile of 
family physicians in Belgium in terms of age, sex 
and geographical location [20]. Before October 
2007, data were collected by a smaller network of 
40-80 GPs. 

The counts of ILI consultations were extrapolated 
to the whole Belgian population to adjust for changes 
in the size of the represented patient population as a 
result of changes in the number of GPs reporting over 
time. In total, data for 214 measurements were avail- 
able. For the years preceding 2007, ILI consultations 
were not monitored outside the influenza season, re- 
sulting in incomplete time series. 
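
The extrapolation step can be illustrated with a minimal sketch. The text does not give the exact formula, so the code assumes that each weekly count is simply divided by the fraction of the population covered by the reporting GPs in that week. The function name `extrapolate_ili` and the example counts and coverage values are hypothetical.

```python
def extrapolate_ili(count, coverage):
    """Scale a sentinel ILI count to the whole population.

    `coverage` is the fraction of the population covered by the GPs that
    reported in that week (an assumed input, e.g. 0.0175 for 1.75%).
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    return count / coverage


# Hypothetical weeks: the same raw count under two different network sizes.
weekly = [(35, 0.0175), (35, 0.0070)]
estimates = [extrapolate_ili(count, cov) for count, cov in weekly]
```

Under this assumption, the same raw count translates into a larger population estimate when fewer GPs report, which is the adjustment the paragraph above describes.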

Laboratory surveillance 

The sentinel laboratory network, coordinated by 
WIV-ISP, has collected data on about 40 infectious 
diseases since 1983 [18]. In 2009, 100 laboratories,
representing 58% of all Belgian laboratories,
participated in the surveillance system on a voluntary basis.
The participating private or hospital laboratories are 
evenly distributed over 33 out of 43 administrative 
districts in Belgium. These laboratories receive bio- 
logical samples from routine diagnostic testing at GP 
practices, hospitals, care homes, etc. On a weekly ba- 
sis, the laboratories send anonymized data to WIV- 
ISP using an electronic system (Epi-Lab), internet 
application or registration form. The incidence of 
different infections, which includes respiratory infec- 
tions, is monitored using this surveillance system, 
allowing for the detection of changes in time or geo- 
graphical trends. 
Data on all available pathogens that potentially
cause ILI were extracted from the Belgian sentinel 
laboratory surveillance database. In particular, data 
on the weekly number of samples that tested positive 
for influenza virus A, influenza virus B, parainfluenza, 
RSV and M. pneumoniae were obtained for the period 
from January 2004 to December 2008, resulting in 260 
measurement points for each of the five pathogens as 
the time series are complete. 

Data analysis 

Modulation models for seasonal time series 

The clinical and five virological time series were first 
smoothed, with the aim of revealing the essential 
(non-parametric) patterns while suppressing excessive 
variations. Smoothing techniques are increasingly 
popular because they provide a statistical tool to 
graphically explore the data and allow modelling of 
the data when classical parametric models fail [21]. 
Because the virological and clinical time series exhibit 
irregular seasonal variation, the time trends were 
smoothed using modulation models for seasonal time 
series [19]. In these models, the overall time trend is 
modelled using an intercept and the periodicity is 
modelled using sine and cosine regressors. The coeffi- 
cients of the intercept, sine and cosine regressors are 
allowed to vary smoothly over time. This permits the 
modelling of global time trends and varying onset, 
duration and severity of incidence peaks over time 
(for details, see Eilers et al. [19]). Because the clinical
data X are a time series of counts exhibiting
overdispersion, the Poisson quasi-likelihood with log-link
and deviance-based correction for overdispersion was
used [19]. In particular, the Poisson expectation of X was
modelled as a smooth function of time t using a basis
of 30 B-splines of third degree for the intercept, sine
and cosine regressors and second-order smoothness
penalties. The optimal smoothness parameters were
selected using the quasi-Akaike information criterion
[19]. For each of the five respiratory pathogens,
smooth functions $Y_i$ with $i = 1, 2, \ldots, 5$ were obtained
similarly.
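
A minimal sketch of such a modulation model is given below, as an illustration only and not the implementation of [19]. It models the log of the Poisson mean as f0(t) + f1(t) cos(wt) + f2(t) sin(wt), where each f is a cubic B-spline curve with 30 basis functions and a second-order difference penalty. Assumptions: the smoothing parameter `lam` is fixed rather than selected by quasi-AIC, the overdispersion correction is omitted (it does not affect the point estimates), the period is 52 weeks, and the simulated data and all function names are hypothetical.

```python
import numpy as np
from math import factorial


def bspline_basis(x, xl, xr, nseg, deg=3):
    """Equally spaced B-spline basis built from truncated power functions."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    P = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    D = D / (factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * P @ D.T      # shape (len(x), nseg + deg)


def fit_modulation(y, t, period=52.0, nseg=27, deg=3, lam=10.0, n_iter=50):
    """Penalized Poisson fit with smoothly varying intercept, cos and sin terms."""
    B = bspline_basis(t, t.min(), t.max(), nseg, deg)
    w = 2 * np.pi * t / period
    # Design matrix: one B-spline block each for intercept, cosine and sine.
    C = np.hstack([B, B * np.cos(w)[:, None], B * np.sin(w)[:, None]])
    D2 = np.diff(np.eye(B.shape[1]), n=2, axis=0)   # second-order differences
    pen = np.kron(np.eye(3), lam * D2.T @ D2)
    pen += 1e-8 * np.eye(pen.shape[0])              # tiny ridge for stability
    eta = np.log(y + 1.0)                           # starting values
    for _ in range(n_iter):                         # penalized IRLS, log link
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                     # working response
        CW = C * mu[:, None]                        # working weights are mu
        coef = np.linalg.solve(C.T @ CW + pen, CW.T @ z)
        eta_new = C @ coef
        converged = np.max(np.abs(eta_new - eta)) < 1e-8
        eta = eta_new
        if converged:
            break
    return np.exp(eta)


# Simulated weekly counts with a yearly cycle (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(260, dtype=float)
true_mu = np.exp(3.0 + 1.5 * np.cos(2 * np.pi * t / 52.0))
y = rng.poisson(true_mu).astype(float)
mu_hat = fit_modulation(y, t)
```

With 27 segments and cubic splines the basis has 27 + 3 = 30 columns per component, matching the basis size stated above. Because the three coefficient curves vary smoothly over time, the fitted peaks can shift in onset, duration and height from season to season.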

Multiple Poisson regression 

Second, the ILI consultation counts X were linearly
regressed on the smoothed predictions of the five
respiratory pathogens, $Y_1, Y_2, \ldots, Y_5$, to assess the
pathogens' contribution to the seasonal variation in
ILI. To this end, the Poisson quasi-likelihood with
deviance-based correction for overdispersion and
identity link was used, giving the expected ILI
counts

$$g(\mathrm{E}[X]) = \mathrm{E}[X] = \sum_{i=1}^{5} \alpha_i Y_i. \qquad (1)$$

Although the log-link is the natural link for Poisson
regression [22], the identity link g was used to obtain
epidemiological interpretations of the estimated
Poisson parameters $\alpha_i$, as explained below.
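
The regression step can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: a quasi-Poisson model with identity link and no intercept is fitted by iteratively reweighted least squares, and the dispersion is estimated as the deviance divided by the residual degrees of freedom. The function names, the two simulated regressors and the coefficient values are hypothetical.

```python
import numpy as np


def poisson_deviance(y, mu):
    """Poisson deviance 2*sum(y*log(y/mu) - (y - mu)), taking 0*log(0) = 0."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return max(2.0 * np.sum(y * np.log(ratio) - (y - mu)), 0.0)


def fit_identity_quasipoisson(y, Y, n_iter=100, tol=1e-10):
    """No-intercept quasi-Poisson regression with identity link (IRLS)."""
    y = np.asarray(y, dtype=float)
    mu = np.maximum(y, 0.5)                  # starting values for the mean
    alpha = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        w = 1.0 / mu                         # identity link: weight = 1/Var
        YW = Y * w[:, None]
        alpha_new = np.linalg.solve(Y.T @ YW, YW.T @ y)
        mu = np.maximum(Y @ alpha_new, 1e-8) # keep fitted means positive
        converged = np.max(np.abs(alpha_new - alpha)) < tol
        alpha = alpha_new
        if converged:
            break
    n, p = Y.shape
    phi = poisson_deviance(y, mu) / (n - p)  # deviance-based dispersion
    cov = phi * np.linalg.inv(Y.T @ (Y / mu[:, None]))
    return alpha, np.sqrt(np.diag(cov)), phi


# Two hypothetical smoothed pathogen series over two years of weekly data.
t = np.arange(104)
Y = np.column_stack([
    5 + 4 * np.cos(2 * np.pi * t / 52),
    3 + 2 * np.sin(2 * np.pi * t / 52),
])
alpha_true = np.array([20.0, 7.0])
rng = np.random.default_rng(1)
y = rng.poisson(Y @ alpha_true).astype(float)
alpha_hat, se, phi = fit_identity_quasipoisson(y, Y)
```

With the identity link each coefficient acts as a scaling factor on its pathogen series, so the fitted mean is a plain sum of rescaled series.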

Epidemiological interpretation of parameters 

Introducing some notation, we let $N(t)_{\mathrm{ILI}}$ denote
the total number of ILI cases in a given population
as a function of time t. Similarly, we denote the total
number of illness cases due to influenza virus A,
influenza virus B, parainfluenza virus, RSV and
M. pneumoniae as $N(t)_{\mathrm{inflA}}$, $N(t)_{\mathrm{inflB}}$, $N(t)_{\mathrm{para}}$, $N(t)_{\mathrm{RSV}}$
and $N(t)_{\mathrm{myco}}$, respectively. Then, assuming that no
other pathogens are causing ILI, it immediately
follows that

$$N(t)_{\mathrm{ILI}} = N(t)_{\mathrm{inflA}} + N(t)_{\mathrm{inflB}} + N(t)_{\mathrm{para}} + N(t)_{\mathrm{RSV}} + N(t)_{\mathrm{myco}}. \qquad (2)$$

However, the total number of cases N(t) in a given 
population is typically unknown as a result of under- 
reporting. Instead, the number of reported cases R(t) 
is observed. Assuming that the reporting probability
$\pi$ is constant over time, it follows that $R(t) = \pi N(t)$.
Hence, rewriting equation (2) in terms of the number 
of reported cases R(t) assuming disease- or pathogen- 
specific reporting probabilities gives 

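$$\frac{R(t)_{\mathrm{ILI}}}{\pi_{\mathrm{ILI}}} = \frac{R(t)_{\mathrm{inflA}}}{\pi_{\mathrm{inflA}}} + \frac{R(t)_{\mathrm{inflB}}}{\pi_{\mathrm{inflB}}} + \frac{R(t)_{\mathrm{para}}}{\pi_{\mathrm{para}}} + \frac{R(t)_{\mathrm{RSV}}}{\pi_{\mathrm{RSV}}} + \frac{R(t)_{\mathrm{myco}}}{\pi_{\mathrm{myco}}} \qquad (3)$$

with, e.g. $R(t)_{\mathrm{ILI}}$ being the number of reported ILI
cases at time t and $\pi_{\mathrm{ILI}}$ being the probability of
reporting an ILI case. Rewriting again and
subsequently simplifying, it follows that

$$R(t)_{\mathrm{ILI}} = \alpha_{\mathrm{inflA}} R(t)_{\mathrm{inflA}} + \alpha_{\mathrm{inflB}} R(t)_{\mathrm{inflB}} + \alpha_{\mathrm{para}} R(t)_{\mathrm{para}} + \alpha_{\mathrm{RSV}} R(t)_{\mathrm{RSV}} + \alpha_{\mathrm{myco}} R(t)_{\mathrm{myco}}, \qquad (4)$$

where $\pi_{\mathrm{ILI}}/\pi_{\mathrm{inflA}} = \alpha_{\mathrm{inflA}}$, $\pi_{\mathrm{ILI}}/\pi_{\mathrm{inflB}} = \alpha_{\mathrm{inflB}}$, etc. It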

should be noted that equation (4) is of the same form 
as equation (1), implying that the parameters $\alpha$ can be
estimated as explained above. The additivity of the
model given in equation (4) also explains the choice of
the identity link. Indeed, using the identity link in
Poisson regression gives rise to an additive
interpretation of the parameters $\alpha$ whereas the commonly
used log-link gives rise to a multiplicative
interpretation [22].

Fig. 1. Weekly number of laboratory reports of (a) influenza virus A, (b) influenza virus B, (c) parainfluenza, (d) RSV,
(e) Mycoplasma pneumoniae and (f) weekly numbers of influenza-like illness (ILI) with corresponding smoothed time series.

Finally, by using ratios of the parameters $\alpha$,
interesting epidemiological interpretations were obtained.
For instance, take (arbitrarily) the parameter $\alpha_{\mathrm{RSV}}$
as reference and construct, for the remaining
parameters, ratios relative to that reference. That is,
construct $\Phi_{\mathrm{inflA}} = \alpha_{\mathrm{inflA}}/\alpha_{\mathrm{RSV}}$, which is
straightforwardly rewritten using the definitions in
expression (4) as

$$\Phi_{\mathrm{inflA}} = \frac{\alpha_{\mathrm{inflA}}}{\alpha_{\mathrm{RSV}}} = \frac{\pi_{\mathrm{ILI}}/\pi_{\mathrm{inflA}}}{\pi_{\mathrm{ILI}}/\pi_{\mathrm{RSV}}} = \frac{1/\pi_{\mathrm{inflA}}}{1/\pi_{\mathrm{RSV}}} = \frac{\varphi_{\mathrm{inflA}}}{\varphi_{\mathrm{RSV}}} \qquad (5)$$

where $1/\pi_{\mathrm{inflA}} = \varphi_{\mathrm{inflA}}$ is the factor needed to correct
for underreporting of diseases due to influenza A and
similarly, $1/\pi_{\mathrm{RSV}} = \varphi_{\mathrm{RSV}}$ is the factor needed to
correct for underreporting of diseases due to RSV.
Hence, $\Phi_{\mathrm{inflA}}$ should be interpreted as the factor
needed to correct for underreporting of influenza A
diseases relative to the factor needed to correct for
underreporting of RSV.



RESULTS 
Data smoothing 

From the laboratory reports, RSV (54.42%) was the
most commonly reported pathogen during 2004-2008,
followed by M. pneumoniae (31.52%),
influenza virus A (7.10%), parainfluenza virus
(4.79%) and influenza virus B (2.20%). Figure 1(a-e)
presents the weekly number of laboratory reports of
influenza virus A, influenza virus B, parainfluenza
virus, RSV, and M. pneumoniae, respectively, together
with the smoothed time series and 95% confidence
intervals. Clearly, strong seasonality can be observed
for influenza virus A, influenza virus B and RSV with
the RSV peaks preceding those of influenza viruses A
and B. Weaker seasonality can be observed for
parainfluenza and M. pneumoniae with the latter showing
a clearly decreasing trend over time. Figure 1f
presents the weekly number of ILI consultations,
also showing strong seasonality that most closely
coincides with the seasonal patterns of the influenza
viruses.

Table 1. Results of the multiple Poisson regression: regression parameters α and ratios Φ of factors to correct for
underreporting with respiratory syncytial virus as reference

Respiratory pathogen           α estimate  (95% CI)           P value   Φ estimate  (95% CI)
Influenza virus A              449.9       (387.9 to 512.0)   0.008     60.6        (26.0 to 95.1)
Influenza virus B              205.6       (54.3 to 357.0)    0.001     27.7        (6.3 to 78.5)
Parainfluenza                  118.6       (48.1 to 189.2)    0.029     15.9        (5.36 to 43.5)
Respiratory syncytial virus    7.4         (3.4 to 11.5)      <0.001    1
Mycoplasma pneumoniae          7.7         (-10.4 to 25.8)    0.404     1.04        (-2.21 to 3.10)


Multiple Poisson regression 

The results of the multiple Poisson model regressing
the ILI consultation counts on the smoothed time
series of influenza virus A, influenza virus B,
parainfluenza, RSV and M. pneumoniae are given in Table 1.
As can be seen, all respiratory pathogens except
M. pneumoniae contribute significantly to explaining
the seasonal variation in ILI consultations. The
results for the ratios $\Phi$ of factors correcting for
underreporting with RSV as reference are given in the last
two columns of Table 1. The 95% confidence intervals
are obtained using Fieller's method [23]. The ratios $\Phi$
indicate that diseases due to RSV were the least
underreported by Belgian laboratory surveillance
whereas diseases due to influenza viruses A and B
were the most underreported.
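
As a small worked check, the ratios can be recomputed from the rounded parameter estimates reported in Table 1. The sketch below covers point estimates only; the Fieller confidence intervals would additionally need the covariance of the estimates, which the text does not report. The variable names are illustrative.

```python
# Rounded regression parameter estimates as reported in Table 1.
alpha = {
    "Influenza virus A": 449.9,
    "Influenza virus B": 205.6,
    "Parainfluenza": 118.6,
    "Respiratory syncytial virus": 7.4,
    "Mycoplasma pneumoniae": 7.7,
}
reference = "Respiratory syncytial virus"

# Ratio of each parameter to the reference parameter, as in equation (5).
ratios = {name: a / alpha[reference] for name, a in alpha.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Because the inputs are rounded to one decimal, the recomputed ratios differ slightly from the published ones (for example about 60.8 instead of 60.6 for influenza virus A).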

Figure 2 gives a graphical representation of the
Poisson regression model given in equation (4). The
smoothed time series of the respiratory pathogens, $Y_i$
($i = 1, 2, \ldots, 5$), are jointly presented in Figure 2a.
To predict the ILI consultations, the smoothed time
series are first rescaled using regression weights $\alpha_i$
(Fig. 2b). Then these rescaled time series $\alpha_i Y_i$ are
summed to predict the ILI consultation counts. The
predicted curve and its 95% confidence interval are
presented by the dark grey area in Figure 2c. As can
be seen from Figure 2(b, c), the peaks in ILI
consultations are mainly explained by influenza virus A
and, to a lesser extent, by influenza virus B.
Furthermore, Figure 2(b, c) suggests that the excess in ILI
consultations before the onset of the influenza
epidemic is mainly explained by RSV. By way of
comparison, the smoothed time series of ILI
consultations X is also presented in Figure 2c (light grey
area). As can be seen, the smoothed ILI curve and
the ILI curve predicted from the
smoothed time series of the respiratory pathogens
overlap closely. This observation is in line
with the pseudo-$R^2$ value obtained for the
overdispersed Poisson regression model [24], i.e. $R^2 = 0.82$,
indicating that ILI seasonality is well predicted by the
seasonality of the respiratory pathogens.
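
The exact pseudo-R² definition of [24] is not spelled out in the text. As a hedged illustration, the sketch below computes a plain deviance-based pseudo-R², one minus the ratio of the model deviance to the deviance of a constant-mean null model. The choice of null model, the function names and the toy numbers are all assumptions, and the measure in [24] may include further adjustments.

```python
import numpy as np


def poisson_deviance(y, mu):
    """Poisson deviance 2*sum(y*log(y/mu) - (y - mu)), taking 0*log(0) = 0."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))


def deviance_r2(y, mu_model, mu_null=None):
    """Deviance-based pseudo-R2 relative to a constant-mean null model."""
    y = np.asarray(y, dtype=float)
    mu_model = np.asarray(mu_model, dtype=float)
    if mu_null is None:
        mu_null = np.full_like(y, y.mean())   # assumed null: overall mean
    return 1.0 - poisson_deviance(y, mu_model) / poisson_deviance(y, mu_null)


# Toy counts and fitted means (hypothetical numbers).
y_obs = np.array([2.0, 5.0, 9.0, 4.0, 1.0])
mu_fit = np.array([2.5, 4.5, 8.0, 4.5, 1.5])
r2 = deviance_r2(y_obs, mu_fit)
```

A value close to 1 means that the fitted means reproduce the observed counts almost as well as a saturated model would.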

DISCUSSION 

In this study, the contribution of respiratory patho- 
gens to the seasonal variation in ILI consultations was 
statistically modelled using data from the Belgian 
clinical and laboratory sentinel surveillance systems, 
which are two independent surveillance systems. The 
statistical methods were smooth modulation models 
for seasonal time series and Poisson regression with 
correction for overdispersion. 

Methods regressing syndromic incidence data on
the number of laboratory reports have been used
previously. Linear regression methods have been
used, among others, to assess the burden of influenza
in terms of general practice consultations, hospital
admissions and deaths [25], to estimate the
contribution of different respiratory pathogens to the
seasonality of NHS Direct respiratory calls [26] and to
validate other syndromic surveillance systems (e.g.
absenteeism, pharmacy sales, laboratory submissions)
for their capability of capturing respiratory pathogen
activity [27]. More advanced regression methods have
been used recently by Yang et al. [28], who used
wavelet analysis to investigate the synchrony of
clinical and laboratory surveillance in Hong Kong. The
method we propose has the advantage of providing
solid epidemiological interpretations. By using ratios
of the estimated regression parameters, relative
factors of disease underreporting by laboratory
surveillance were obtained. Furthermore, the method allows
interesting and interpretable visualizations of the
model results.

Fig. 2. Graphical presentation of the multiple Poisson regression: (a) smoothed time series of respiratory pathogens,
(b) rescaled time series of respiratory pathogens, (c) smoothed and predicted time series of influenza-like illness.
RSV, Respiratory syncytial virus.

The model results indicate that, in line with pre- 
vious research, significant contributions were found 
for influenza viruses A and B, parainfluenza virus and 
RSV [12]. The contribution of M. pneumoniae was not 
found to be significant. The peaks of ILI consulta- 
tions were mainly explained by influenza virus A and, 



to a lesser extent, by influenza virus B, whereas 
the excess in ILI consultations prior to the onset of 
the influenza epidemic was explained by RSV. A
significant year-round contribution was found for
parainfluenza. By using ratios of the estimated regression
parameters, we found that diseases due to RSV and 
M. pneumoniae were the least underreported by 
Belgian laboratory surveillance whereas diseases due 
to influenza viruses A and B were the most
underreported. These large differences in relative measures
of underreporting are due to case ascertainment bias 
and can be interpreted as a reflection of medical 
practice in Belgium. For instance, causes of childhood 
diseases are frequently tested, as a cautious principle 
of sampling is often adopted for young patients. RSV 
is such a childhood disease. Furthermore, the costs 
of RSV testing for children aged <2 years are reim- 
bursed by compulsory Belgian medical insurance, 
explaining the (relatively) small amount of RSV un- 
derreporting. On the other hand, as ILI is a clinically 
based diagnosis with a symptom-related treatment, 
its causes are rarely tested during the influenza 
season, which explains the (relatively) large amount of 
underreporting for influenza viruses A and B. Causes 
of respiratory infections outside the influenza season 
could be more frequently tested, explaining the 
(relatively) small amount of underreporting for 
M. pneumoniae, a non-seasonal pathogen circulating
throughout the year.

The proposed regression model provides a good fit, 
indicating that ILI seasonality is well predicted by the 
seasonality of respiratory pathogens. This can also be 
regarded as a mutual validation of the independent 
clinical and laboratory surveillance systems. The 
model relies on two important assumptions. First, it 
is assumed that the pathogen-specific reporting
probabilities are constant over time. This assumption
seems epidemiologically plausible and, moreover,
is hard to relax as it could lead to non-identifiable
regression models. The second assumption that all 
ILI cases are caused by a limited set of respiratory 
pathogens (i.e. influenza virus A, influenza virus B, 
parainfluenza virus, RSV, M. pneumoniae) is obviously 
not correct. However, other pathogens with the 
potential to cause ILI are not monitored by Belgian 
laboratory surveillance and hence, could not be in- 
cluded in the regression model. Instead, an intercept 
might be included to implicitly account for the patho- 
gens for which no or only limited information is 
available. However, this assumes that the contri- 
bution of these unknown or missing pathogens to ILI 
consultations is constant over time, which is clearly 
not the case. By excluding the intercept, as done in the 
current study, the model predictions are likely to 
locally underestimate the observed number of ILI 
consultations. These underestimations are informa- 
tive, suggesting the activity of an unknown or missing 
pathogen. Future research might attempt to discover 
an explanation for the observed underestimation 



using other databases or published studies. For the 
Belgian data, such an underestimation was observed 
prior to the influenza epidemic of 2008 (see Fig. 2c),
but could not be explained. 

To conclude, the seasonality of ILI is well predicted 
by the seasonality of influenza viruses A and B, 
parainfluenza and RSV. In addition, relative factors 
of underreporting of respiratory pathogens in lab- 
oratory surveillance have been obtained indicating 
that RSV is the least and influenza A is the most un- 
derreported pathogen in Belgian laboratory surveil- 
lance. The results of this study are helpful in 
interpreting the data of clinical and laboratory sur- 
veillance, which are the essential parts of influenza 
sentinel surveillance. The proposed methods provide 
interesting epidemiological interpretations and are 
versatile. Future research might include an extension 
of the current analysis by including additional cov- 
ariate information such as age and geographical 
location. Furthermore, although not explicitly inves- 
tigated in this paper, the smooth modulation models 
for seasonal time series [19] allow the modelling of 
varying onset, duration and severity of the incidence 
peaks over time. Such an approach would yield in- 
teresting insights into the temporal variation in viral 
agents [29] and disease dynamics. 